FIELDS(3)                                                            FIELDS(3)

NAME
       fieldread, fieldmake, fieldwrite, fieldfree - field access package

SYNTAX
       #include "fields.h"

       typedef struct {
           int    nfields;
           int    hadnl;
           char  *linebuf;
           char **fields;
       } field_t;

       #define FLD_RUNS        0x0001
       #define FLD_SNGLQUOTES  0x0002
       #define FLD_BACKQUOTES  0x0004
       #define FLD_DBLQUOTES   0x0008
       #define FLD_SHQUOTES    0x0010
       #define FLD_STRIPQUOTES 0x0020
       #define FLD_BACKSLASH   0x0040

       extern field_t *fieldread (FILE * file, char * delims,
                                  int flags, int maxf);
       extern field_t *fieldmake (char * line, int allocated,
                                  char * delims, int flags, int maxf);
       extern int      fieldwrite (FILE * file, field_t * fieldp, int delim);
       extern void     fieldfree (field_t * fieldp);

       extern unsigned int field_line_inc;
       extern unsigned int field_field_inc;

DESCRIPTION
       The fields access package eases the common task of parsing and
       accessing information which is separated into fields by whitespace
       or other delimiters. Various options can be specified to handle
       many common cases, including selectable delimiters, runs of
       delimiters, and quoting.

       fieldread reads one line from a file, parses it into fields as
       specified by the parameters, and returns a field_t structure
       describing the result. fieldmake performs the same process on a
       buffer already in memory. fieldwrite creates an output line from a
       field_t structure and writes it to an output file. fieldfree frees
       a field_t structure and any associated memory allocated by the
       package.

       The field_t structure describes the fields in a parsed line. A
       well-behaved program should only access the nfields, fields, and
       hadnl elements; all other elements are used internally by the
       package and are not guaranteed to remain the same even though they
       are documented here.

       Nfields gives the number of fields in the parsed line, just like
       the argc argument to a C program; fields is a pointer to an array
       of string pointers, just like the argv argument to a C program. As
       in C, the last field pointer is followed by a null pointer,
       although the field count is the preferred method of accessing
       fields. The user may alter nfields by decreasing it, and may
       replace any pointer in fields without harm. This is often useful in
       replacing a single field with a calculated value preparatory to
       output. The hadnl element is nonzero if the original line was
       terminated with a newline when it was parsed; this is used to
       accurately reproduce the input when fieldwrite is called.

       The linebuf element contains a pointer to an internal buffer
       allocated by fieldread or provided to fieldmake. This buffer is not
       guaranteed to contain anything sensible, although in the current
       implementation all of the field contents can be found therein.

       fieldread reads a single line of arbitrary length from file,
       allocating as much memory as necessary to hold it, and then parses
       the line according to its remaining arguments. A pointer to the
       parsed field_t structure is returned, with NULL returned if an
       error occurs or if EOF is reached on the input file. Fields in the
       input line are considered to be separated by any of the delimiters
       in the delims parameter. For example, if delimiters of ":.;" are
       specified, a line containing "a:b;c.d" would be considered to have
       four fields.

       The default parsing of fields considers each delimiter to indicate
       a separate field, and does not allow any quoting. This is similar
       to the parsing done by cut(1). This behavior can be modified by
       specifying flags. Multiple flags may be OR'ed together.
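
       The default rule described above (every delimiter starts a new
       field, no quoting, argv-style result) can be illustrated with a
       short stand-alone sketch. It does not use the package and is not
       its implementation; split_default, its signature, and the caller-
       supplied pointer array are hypothetical names chosen only for this
       illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative stand-in for the default (flags == 0) splitting rule.
 * Hypothetical helper, not part of the fields package.
 * Splits `line` in place and stores at most `max` field pointers in
 * `out`, followed by a null pointer, so `out` needs max + 1 slots.
 * Returns the number of fields stored.
 */
static int split_default(char *line, const char *delims, char **out, int max)
{
    int n = 0;

    assert(line != NULL && delims != NULL && out != NULL && max > 0);
    out[n++] = line;                  /* the first field starts the line */
    for (char *p = line; *p != '\0'; p++) {
        if (strchr(delims, *p) != NULL && n < max) {
            *p = '\0';                /* terminate the current field */
            out[n++] = p + 1;         /* each delimiter opens a new field */
        }
    }
    out[n] = NULL;                    /* argv-style terminator */
    return n;
}
```

       Called with the line "a:b;c.d" and the delimiters ":.;", the sketch
       stores four fields. Because every delimiter counts, the line "a::b"
       with the delimiter ":" yields three fields, the middle one empty.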

       The available flags are:

       FLD_RUNS
              Consider runs of delimiters to be the same as a single
              delimiter, suppressing all null fields. This is similar to
              the way utilities like awk(1) and sort(1) treat whitespace,
              but it is not limited to whitespace. A run does not have to
              consist of a single type of delimiter; if both semicolon and
              colon are delimiters, ";::;" is a run.

       FLD_SNGLQUOTES
              Allow field contents to be quoted with single quotes.
              Delimiters and other quotes appearing within single quotes
              are ignored. This may appear in combination with other quote
              options.

       FLD_BACKQUOTES
              Allow field contents to be quoted with reverse single
              quotes. Delimiters and other quotes appearing within reverse
              single quotes are ignored. This may appear in combination
              with other quote options.

       FLD_DBLQUOTES
              Allow field contents to be quoted with double quotes.
              Delimiters and other quotes appearing within double quotes
              are ignored. This may appear in combination with other quote
              options.

       FLD_SHQUOTES
              Allow shell-style quoting. In the absence of this option,
              quotes are only recognized at the beginning of a field, and
              characters following the close quote are removed from the
              field (and are thus lost from the input line). If this
              option is specified, quotes may appear within a field, in
              the same way as they are handled by sh(1). Multiple quoting
              styles may be used in the same field. If none of
              FLD_SNGLQUOTES, FLD_BACKQUOTES, or FLD_DBLQUOTES is
              specified with FLD_SHQUOTES, all three options are implied.

       FLD_STRIPQUOTES
              Remove quotes and backslash sequences from the field while
              parsing, converting backslash sequences to their proper
              ASCII equivalent. The C sequences \a, \b, \f, \n, \r, \v,
              \xnn, and \nnn are supported. Any other sequence is simply
              converted to the backslashed character, as in sh(1).

       FLD_BACKSLASH
              Accept standard C-style backslash sequences. The sequence
              will be converted to an ASCII equivalent if FLD_STRIPQUOTES
              is specified (q.v.).

       FLD_NOSHRINK
              Don't shrink allocated memory using realloc(3) before
              returning. This option can have a significant effect on
              performance, especially when fieldfree is going to be called
              soon after fieldread or fieldmake. The disadvantage is that
              slightly more memory will be occupied until the field
              structure is freed.

       The maxf parameter, if nonzero, specifies the maximum number of
       fields to be generated. This may enhance performance if only the
       first few fields of a long line are of interest to the caller. The
       actual number of fields returned is one greater than maxf, because
       the remainder of the line will be returned as a single contiguous
       (and uninterpreted, FLD_STRIPQUOTES or FLD_BACKSLASH is specified)
       field.

       fieldmake operates exactly like fieldread, except that the line
       parsed is provided by the caller rather than being read from a
       file. If the allocated parameter is nonzero, the memory pointed to
       by the line parameter will automatically be freed when fieldfree is
       called; otherwise this memory is the caller's responsibility. The
       memory pointed to by line is destroyed by fieldmake. All other
       parameters are the same as for fieldread.

       fieldwrite writes a set of fields to the specified file, separating
       them with the delimiter character delim (note that this is a
       character, not a string), and appending a newline if specified by
       the hadnl element of the structure. The field structure is not
       freed. fieldwrite will return nonzero if an I/O error is detected.
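
       The output format that the fieldwrite description implies (fields
       joined by one delimiter character, with a newline only when hadnl
       is nonzero) can be sketched without the package. The helper below
       builds the line in a caller-supplied buffer rather than writing to
       a file; join_fields and its parameters are hypothetical, and the
       sketch assumes the fields are emitted as stored, with no quoting
       added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the fields package: builds in `dst`
 * the line implied by the fieldwrite description. Returns 0 on success
 * and nonzero if `dst` (of `cap` bytes) is too small.
 */
static int join_fields(char *dst, size_t cap, char *const *fields,
                       int nfields, int delim, int hadnl)
{
    size_t used = 0;

    assert(dst != NULL && fields != NULL && nfields >= 0);
    for (int i = 0; i < nfields; i++) {
        size_t len = strlen(fields[i]);

        if (i > 0) {                  /* one delimiter between fields */
            if (used + 1 >= cap)
                return 1;
            dst[used++] = (char)delim;
        }
        if (used + len >= cap)        /* keep room for the final NUL */
            return 1;
        memcpy(dst + used, fields[i], len);
        used += len;
    }
    if (hadnl) {                      /* reproduce the original newline */
        if (used + 1 >= cap)
            return 1;
        dst[used++] = '\n';
    }
    if (used >= cap)
        return 1;
    dst[used] = '\0';
    return 0;
}
```

       Passing a smaller nfields than the array holds writes only the
       leading fields, which mirrors the rule that the user may decrease
       nfields before output.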

       fieldfree frees the field_t structure passed to it, along with any
       associated auxiliary memory allocated by the package (or passed to
       fieldmake). The structure may not be accessed after fieldfree is
       called.

       field_line_inc (default 512) and field_field_inc (default 20)
       describe the increments to use when expanding lines as they are
       read in and parsed. fieldread initially allocates a buffer of
       field_line_inc bytes and, if the input line is larger than that,
       expands the buffer in increments of the same amount until it is
       large enough. If input lines are known to consistently reach a
       certain size, performance will be improved by setting
       field_line_inc to a value larger than that size (larger because
       there must be room for a null byte). field_field_inc serves the
       same purpose in both fieldread and fieldmake, except that it is
       related to the number of fields in the line rather than to the line
       length. If the number of fields is known, performance will be
       improved by setting field_field_inc to at least one more than that
       number.

RETURN VALUES
       fieldread and fieldmake return NULL if an error occurs or if EOF is
       reached on the input file. fieldwrite returns nonzero if an output
       error occurs.

BUGS
       Thanks to the vagaries of ANSI C, the fields.h header file defines
       an auxiliary macro named P. If the user needs a similarly-named
       macro, this macro must be undefined first, and the user's macro
       must be defined after fields.h is included.
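
       The sizing advice for field_line_inc can be checked with a little
       arithmetic. The sketch below computes the buffer size reached by
       growing in fixed increments until a line and its terminating null
       byte fit. buffer_size_for is a hypothetical helper, not part of the
       package, and it assumes that len counts every byte of the line
       that is stored in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the fields package: size of a buffer
 * that starts at `inc` bytes and grows by `inc` until it can hold `len`
 * bytes of line data plus one terminating null byte.
 */
static size_t buffer_size_for(size_t len, size_t inc)
{
    size_t size = inc;

    assert(inc > 0);
    while (size < len + 1)            /* +1 leaves room for the null byte */
        size += inc;
    return size;
}
```

       With the default increment of 512, a 511-byte line fits in the
       initial buffer, while a 512-byte line forces one expansion to 1024
       bytes. This is why the increment should be larger than, not equal
       to, the expected line size.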